Building Corpora for the Development of a Dependency Parser for Spanish Using Maltparser

Authors

  • Jesús Herrera
  • Pablo Gervás
  • Pedro J. Moriano
  • Alfonso Muñoz
  • Luis Romero
Abstract

This paper details the process followed to create training and test corpora for a dependency parser generator (MaltParser). The starting point is the Cast3LB corpus, which contains constituency analyses of Spanish texts. These constituency analyses are automatically transformed into dependency analyses. The paper also describes how a set of syntactic function labels for the training corpus was obtained empirically and semi-automatically. The result of this process is a dependency parser for Spanish that reaches 91% precision when determining dependencies.
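The abstract does not say how the constituency-to-dependency transformation is carried out. One common technique for this kind of conversion is head percolation: each phrase picks a head child from a rule table, and the head words of its other children become dependents of that head's word. The Python sketch below illustrates that general idea only. The `Node` class, the `HEAD_RULES` table, the labels, and the example sentence are hypothetical and are not taken from the paper or from Cast3LB.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    label: str                                   # phrase label or POS tag
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None                   # set only on leaves
    index: int = 0                               # 1-based token position (leaves)


# Hypothetical head rules: for each phrase label, child labels in priority order.
HEAD_RULES: Dict[str, List[str]] = {
    "S": ["VP"],
    "VP": ["V"],
    "NP": ["N"],
}


def head_child(node: Node) -> Node:
    """Pick the head child of a phrase using HEAD_RULES (leftmost as fallback)."""
    for wanted in HEAD_RULES.get(node.label, []):
        for child in node.children:
            if child.label == wanted:
                return child
    return node.children[0]


def head_leaf(node: Node) -> Node:
    """Follow head children down to the lexical head of a subtree."""
    while node.word is None:
        node = head_child(node)
    return node


def to_dependencies(root: Node) -> Dict[int, int]:
    """Map each token index to the index of its governor (0 = artificial root)."""
    heads: Dict[int, int] = {}

    def visit(node: Node, governor: int) -> None:
        if node.word is not None:
            heads[node.index] = governor
            return
        chosen = head_child(node)
        head_index = head_leaf(chosen).index
        for child in node.children:
            # The head child inherits the phrase's governor;
            # every other child attaches to the phrase's lexical head.
            visit(child, governor if child is chosen else head_index)

    visit(root, 0)
    return heads


# Illustrative tree for "El gato come pescado" (invented example).
EXAMPLE = Node("S", [
    Node("NP", [Node("D", word="El", index=1), Node("N", word="gato", index=2)]),
    Node("VP", [
        Node("V", word="come", index=3),
        Node("NP", [Node("N", word="pescado", index=4)]),
    ]),
])

print(to_dependencies(EXAMPLE))
```

In this toy case the verb "come" attaches to the artificial root, "gato" and "pescado" attach to the verb, and "El" attaches to "gato". A real conversion would need a much larger rule table, and assigning syntactic function labels to the arcs is a separate step.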

Similar resources

Comparing Rule-Based and Data-Driven Dependency Parsing of Learner Language

We explore the performance of two dependency parsing approaches, the rule-based WCDG approach (Foth and Menzel 2006) and the data-driven dependency parser MaltParser (Nivre et al. 2007), on texts written by language learners. We show that WCDG outperforms MaltParser in identifying the main functor-argument relations, whereas MaltParser is more successful than WCDG in establishing optional, adjunct...

MaltParser: A Data-Driven Parser-Generator for Dependency Parsing

We introduce MaltParser, a data-driven parser generator for dependency parsing. Given a treebank in dependency format, MaltParser can be used to induce a parser for the language of the treebank. MaltParser supports several parsing algorithms and learning algorithms, and allows user-defined feature models, consisting of arbitrary combinations of lexical features, part-of-speech features and depe...

Improving Parsing Accuracy for Spanish Using Maltparser (Mejora de la Precisión del Análisis para el Español con Maltparser)

In recent years, dependency parsing has been carried out by machine learning-based systems that show high accuracy, although usually under 90% for Labelled Attachment Score (LAS). MaltParser is one such system. Machine learning makes it possible to obtain parsers for any language that has an adequate training corpus. Since such systems generally cannot be modified, the following question arises: Can we be...

A Data-Driven Dependency Parser for Bulgarian

One of the main motivations for building treebanks is that they facilitate the development of syntactic parsers, by providing realistic data for evaluation as well as inductive learning. In this paper we present what we believe to be the first robust data-driven parser for Bulgarian, trained and evaluated on data from BulTreeBank (Simov et al., 2002). The parser uses dependency-based representa...

Feature Engineering in Persian Dependency Parser

A dependency parser is one of the most important fundamental tools in natural language processing: it extracts the structure of sentences and determines the relations between words based on dependency grammar. Dependency parsing is well suited to free word order languages such as Persian. In this paper, a data-driven dependency parser has been developed with the help of phrase-structure parser fo...

Journal title:
  • Procesamiento del Lenguaje Natural

Volume 39, Issue

Pages -

Publication date: 2007